Collapsed qualifiers kg #268

beasleyjonm · 2024-10-31T17:44:32Z

Added a script to produce jsonl and neo4j dumps with the qualifiers collapsed into the edge predicate. This should be useful for internal rule mining and heterogeneous graph embedding methods where we don't necessarily need to be Biolink compliant.
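As a rough illustration of what "collapsed into the edge predicate" means, the sketch below folds two qualifier values into the predicate string and drops the qualifier keys from the edge. The naming scheme, the helper name `collapse_edge`, and the placeholder CURIEs are assumptions made for illustration; they are not taken from the script in this PR.

```python
import json

# Qualifier keys handled in this sketch (Biolink slot names).
QUALIFIER_KEYS = ("object_direction_qualifier", "object_aspect_qualifier")


def collapse_edge(edge: dict) -> dict:
    """Return a copy of `edge` with qualifier values folded into the predicate.

    Hypothetical naming scheme: <predicate>_<direction>_<aspect>.
    """
    # Copy everything except the qualifier keys themselves.
    collapsed = {k: v for k, v in edge.items() if k not in QUALIFIER_KEYS}
    # Collect the qualifier values that are present, in a fixed order.
    parts = [edge[k] for k in QUALIFIER_KEYS if k in edge]
    if parts:
        collapsed["predicate"] = "_".join([edge["predicate"], *parts])
    return collapsed


# Placeholder edge; the subject and object identifiers are not real CURIEs.
edge = {
    "subject": "EX:chemical1",
    "predicate": "biolink:affects",
    "object": "EX:gene1",
    "object_direction_qualifier": "decreased",
    "object_aspect_qualifier": "activity",
}

# One JSON object per line, as in an edges.jsonl file.
print(json.dumps(collapse_edge(edge)))
```

Edges without any qualifiers pass through with their predicate unchanged.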

EvanDietzMorris

Thanks @beasleyjonm, this looks great. I think we should go ahead and implement this in a way that handles all qualifiers, though. That way we don't need to worry about keeping this up to date when we inevitably add more, and it would cover the couple of others we already have.

I added a few comments here for how we could do that pretty easily.

EvanDietzMorris · 2024-11-01T03:52:00Z

Common/collapse_qualifiers.py

+# TODO - really we should get the full list of qualifiers from Common/biolink_constants.py,
+#  but because we currently cannot deduce the association types of edges and/or permissible value enumerators,
+#  we have to hard code qualifier handling anyway, we might as well check against a smaller list
+QUALIFIER_KEYS = [OBJECT_DIRECTION_QUALIFIER, OBJECT_ASPECT_QUALIFIER]
+# we do have these qualifiers but we cant do any redundancy with them so ignore for now:
+# QUALIFIED_PREDICATE -
+# SPECIES_CONTEXT_QUALIFIER -


These comments aren't as applicable to collapse_qualifiers as they were for the redundant graph. For the redundant graph we needed to look up the ancestors of each qualifier value, but to do that you need to know the enum_name of the qualifier value, and I wasn't sure how to derive that given an edge from ORION, so we hard coded what we needed at the time (which is still not ideal). In this case, we only care about the qualifiers and values that are actually on the edge, so we should be able to capture them all. This is easy to do dynamically with the biolink model toolkit.

Getting the full list of qualifiers from the constants file was a bad idea anyway; they shouldn't be hard coded if the biolink model toolkit can supply the current list. I'm not sure why I said that. :)

EvanDietzMorris · 2024-11-01T04:05:10Z

Common/collapse_qualifiers.py

+
+            # qualifiers = check_qualifier(edge) <- it would be better to do something like this but because we're not
+            # handling other qualifiers anyway it's faster to just do the following:
+            qualifiers = [qualifier for qualifier in QUALIFIER_KEYS if qualifier in edge]


Instead of using this hard-coded list, we can do something like the following (except you would want to instantiate the toolkit outside of the loop):

from Common.biolink_utils import get_biolink_model_toolkit
bmt = get_biolink_model_toolkit()
qualifiers = {key: value for key, value in edge.items() if bmt.is_qualifier(key)}

Then you could do something like the following, with a function that does your semantic transformations where applicable:

for qualifier, qualifier_value in qualifiers.items():
    qualifier_statement += semantic_adjustment(qualifier, qualifier_value)
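
Put together, a minimal runnable sketch of that idea might look like the following. To keep it self-contained it does not import the toolkit: `is_qualifier` is a stand-in for `bmt.is_qualifier` that uses a simple naming rule, and `semantic_adjustment` is a hypothetical placeholder for whatever transformation each qualifier needs.

```python
def is_qualifier(key: str) -> bool:
    """Stand-in for bmt.is_qualifier so the sketch runs without the toolkit.

    Assumption: qualifier slots end in "_qualifier" or are "qualified_predicate".
    In ORION this check would be delegated to the Biolink Model Toolkit.
    """
    return key.endswith("_qualifier") or key == "qualified_predicate"


def semantic_adjustment(qualifier: str, value: str) -> str:
    """Hypothetical placeholder: turn one qualifier value into a predicate suffix."""
    return "_" + value.replace(" ", "_")


def build_qualifier_statement(edge: dict) -> str:
    """Concatenate the adjusted values of the qualifiers present on `edge`."""
    # Only keys that are actually on this edge are tested, so no fixed list is needed.
    qualifiers = {key: value for key, value in edge.items() if is_qualifier(key)}
    qualifier_statement = ""
    # Sort by key so the result does not depend on property order in the input file.
    for qualifier, qualifier_value in sorted(qualifiers.items()):
        qualifier_statement += semantic_adjustment(qualifier, qualifier_value)
    return qualifier_statement
```

Because the keys are sorted, `object_aspect_qualifier` is applied before `object_direction_qualifier` here; a real implementation would probably want an explicit ordering instead.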

beasleyjonm · 2024-11-07

Thanks for the suggestions. I've added a new commit that handles all of the qualifiers. There are still some hard-coded decisions involved with the current qualifiers, but it will print a warning if new qualifiers that aren't handled here are ever found in edges.jsonl files in the future.

EvanDietzMorris · 2024-11-08

Looping through the qualifiers once you have determined them, as I mentioned, would be better for two reasons. It is more efficient, because it only attempts to handle the qualifiers that are actually there instead of checking whether every possible qualifier is on every edge. It is also cleaner and more maintainable, mostly because the current implementation violates the DRY principle.

It would also allow you to check against fewer things per qualifier using if/elif, and to remove the counting part:

if qualifier_key == qualifier_type_x:
    ...
elif qualifier_key == qualifier_type_y:
    ...
else:
    print(f"{qualifier_key} not supported")

All that being said, it looks like it should all work and won't break or affect other parts of ORION, so it's fine with me to merge in if you'd like.

It would be even better to do a lookup of functions or something like that:

semantic_adjustments = {
    qualifier_key_1: some_semantic_adjustment_function,
    qualifier_key_2: some_other_semantic_adjustment_function,
}

if qualifier_key in semantic_adjustments:
    adjusted_qualifier = semantic_adjustments[qualifier_key](qualifier_key, qualifier_value)
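
A fuller sketch of the lookup approach, with a warning for qualifiers that have no handler yet, could look like this. The two handler functions and their behavior are hypothetical; only the shape of the dispatch is the point.

```python
import warnings


def adjust_direction(qualifier_key: str, qualifier_value: str) -> str:
    """Hypothetical handler: direction values are used as-is."""
    return qualifier_value


def adjust_aspect(qualifier_key: str, qualifier_value: str) -> str:
    """Hypothetical handler: make aspect values safe to embed in a predicate."""
    return qualifier_value.replace(" ", "_")


# Map each supported qualifier key to the function that handles it.
SEMANTIC_ADJUSTMENTS = {
    "object_direction_qualifier": adjust_direction,
    "object_aspect_qualifier": adjust_aspect,
}


def adjust_qualifiers(qualifiers: dict) -> list[str]:
    """Apply the registered handler to each qualifier, warning on unknown keys."""
    adjusted = []
    for qualifier_key, qualifier_value in qualifiers.items():
        handler = SEMANTIC_ADJUSTMENTS.get(qualifier_key)
        if handler is None:
            # New or unhandled qualifiers are reported instead of silently dropped.
            warnings.warn(f"{qualifier_key} not supported")
            continue
        adjusted.append(handler(qualifier_key, qualifier_value))
    return adjusted
```

Supporting a new qualifier then means adding one function and one dictionary entry, with no change to the loop.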

Jon-Michael Beasley added 2 commits October 31, 2024 11:30

Added script to collapse object qualifier statements to the edge predicates.

0abf23c

Updated to fix code and add option to create collapsed qualifier Neo4j dump.

0839bcc

beasleyjonm requested a review from EvanDietzMorris October 31, 2024 17:44

EvanDietzMorris requested changes Nov 1, 2024

Update collapse_qualifiers.py

7bbdc00

EvanDietzMorris self-requested a review November 8, 2024 05:36

EvanDietzMorris approved these changes Nov 8, 2024

beasleyjonm merged commit 52149ee into master Nov 21, 2024
1 check passed

beasleyjonm deleted the collapsed_qualifiers_kg branch November 21, 2024 17:14

beasleyjonm commented Oct 31, 2024

EvanDietzMorris left a comment

EvanDietzMorris Nov 1, 2024

EvanDietzMorris Nov 1, 2024

beasleyjonm Nov 7, 2024

EvanDietzMorris Nov 8, 2024

EvanDietzMorris Nov 8, 2024
